Overall

Column

Data completeness

Data accuracy

Overall data quality

Column

Trend of cases in percent

Total cases per year

Year Cases_per_year
1998 1317
1999 1337
2000 1405
2001 1248
2002 1320
2003 1089
2004 1328
2005 959
2006 1255
2007 1088
2008 1029
2009 669
2010 853
2011 796
2012 833
2013 824
2014 872
2015 897

Data summary

Column

Chart A

| No | Variable | Stats / Values | Freqs (% of Valid) | Valid | Missing |
|----|----------|----------------|--------------------|-------|---------|
| 1 | Year [integer] | mean (sd): 2005.56 (5.16); min < med < max: 1998 < 2005 < 2015; IQR (CV): 9 (0) | 18 distinct values | 19119 (100%) | 0 (0%) |
| 2 | Month [character] | 1. May 2. June 3. December 4. April 5. March 6. July 7. August 8. January 9. November 10. February [ 2 others ] | 1898 (9.9%); 1819 (9.5%); 1816 (9.5%); 1725 (9.0%); 1724 (9.0%); 1538 (8.0%); 1533 (8.0%); 1515 (7.9%); 1509 (7.9%); 1448 (7.6%); 2594 (13.6%) | 19119 (100%) | 0 (0%) |
| 3 | State [character] | 1. Florida 2. California 3. Ohio 4. Illinois 5. New York 6. Michigan 7. Minnesota 8. Washington 9. Maryland 10. Colorado [ 45 others ] | 2409 (12.6%); 2293 (12.0%); 1288 (6.7%); 1133 (5.9%); 909 (4.8%); 882 (4.6%); 842 (4.4%); 824 (4.3%); 816 (4.3%); 510 (2.7%); 7213 (37.7%) | 19119 (100%) | 0 (0%) |
| 4 | Location [character] | 1. Restaurant 2. Private Home/Residence 3. Catering Service 4. Banquet Facility 5. Fast Food Restaurant 6. Unknown 7. School/College/University 8. Grocery Store 9. Restaurant; Private Home/ 10. Prison/Jail [ 151 others ] | 10448 (61.6%); 1681 (9.9%); 1089 (6.4%); 367 (2.2%); 366 (2.2%); 355 (2.1%); 354 (2.1%); 301 (1.8%); 205 (1.2%); 193 (1.1%); 1594 (9.7%) | 16953 (88.67%) | 2166 (11.33%) |
| 5 | Food [character] | 1. Multiple Foods 2. Oysters, Raw 3. Ground Beef, Hamburger 4. Salad, Unspecified 5. Chicken, Unspecified 6. Sandwich, Submarine 7. Chicken, Other 8. Lettuce-Based Salads Unsp 9. Pizza, Unspecified 10. Pork, Bbq [ 3117 others ] | 206 (2.0%); 165 (1.6%); 127 (1.2%); 118 (1.2%); 105 (1.0%); 85 (0.8%); 83 (0.8%); 81 (0.8%); 81 (0.8%); 80 (0.8%); 9025 (89.9%) | 10156 (53.12%) | 8963 (46.88%) |
| 6 | Ingredient [character] | 1. Fin Fish 2. Chicken 3. Beef 4. Egg 5. Pork 6. Turkey 7. Leafy Green 8. Milk 9. Ground Beef 10. Rice [ 371 others ] | 190 (10.1%); 179 (9.5%); 124 (6.6%); 119 (6.3%); 105 (5.6%); 64 (3.4%); 51 (2.7%); 41 (2.2%); 37 (2.0%); 35 (1.9%); 931 (48.9%) | 1876 (9.81%) | 17243 (90.19%) |
| 7 | Species [character] | 1. Norovirus genogroup I 2. Salmonella enterica 3. Norovirus genogroup II 4. Norovirus unknown 5. Clostridium perfringens 6. Staphylococcus aureus 7. Escherichia coli, Shiga t 8. Scombroid toxin 9. Norovirus 10. Bacillus cereus [ 191 others ] | 2744 (21.9%); 2303 (18.4%); 1424 (11.4%); 790 (6.3%); 732 (5.9%); 532 (4.3%); 485 (3.9%); 389 (3.1%); 334 (2.7%); 299 (2.4%); 2468 (20.0%) | 12500 (65.38%) | 6619 (34.62%) |
| 8 | Serotype/Genotype [character] | 1. Unknown 2. Enteritidis 3. O157:H7 4. Typhimurium 5. Newport 6. Heidelberg 7. GII_4 Sydney (2012) 8. Javiana 9. Braenderup 10. GII_4 New Orleans (2009) [ 229 others ] | 690 (17.7%); 686 (17.6%); 415 (10.6%); 282 (7.2%); 169 (4.3%); 156 (4.0%); 119 (3.0%); 69 (1.8%); 57 (1.5%); 54 (1.4%); 1210 (31.5%) | 3907 (20.44%) | 15212 (79.56%) |
| 9 | Status [character] | 1. Confirmed 2. Suspected 3. Suspected; Suspected 4. Confirmed; Confirmed 5. Confirmed; Suspected 6. Confirmed; Confirmed; Con 7. Suspected; Confirmed 8. Confirmed; Suspected; Sus 9. Confirmed; Confirmed; Sus 10. Suspected; Suspected; Sus [ 12 others ] | 7909 (63.3%); 4068 (32.5%); 310 (2.5%); 133 (1.1%); 32 (0.3%); 17 (0.1%); 7 (0.1%); 4 (0.0%); 3 (0.0%); 3 (0.0%); 14 (0.1%) | 12500 (65.38%) | 6619 (34.62%) |
| 10 | Illnesses [integer] | mean (sd): 19.54 (49.45); min < med < max: 2 < 8 < 1939; IQR (CV): 16 (2.53) | 302 distinct values | 19119 (100%) | 0 (0%) |
| 11 | Hospitalizations [integer] | mean (sd): 0.95 (5.31); min < med < max: 0 < 0 < 308; IQR (CV): 1 (5.61) | 61 distinct values | 15494 (81.04%) | 3625 (18.96%) |
| 12 | Fatalities [integer] | mean (sd): 0.02 (0.39); min < med < max: 0 < 0 < 33; IQR (CV): 0 (17.82) | 12 distinct values | 15518 (81.17%) | 3601 (18.83%) |

Generated by summarytools 0.8.7 (R version 3.5.1)
2018-09-26

Data completeness

Column

Missing values per Variable (in Percent)

Column

Missing values per State (in Percent)

Overall data completeness

97

Data accuracy

Column

Number of illnesses per outbreak by US state

Column

Chart B

Chart C

59
---
title: "DGEpi 2018 Data Quality Demo Dashboard"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    css: ["css/flex.css", "css/summarytools.css"]
    logo: pics/hzi_logo.png
    source_code: embed
    navbar:
      - { title: "Back to Talk", href: "index.html#23"}
---

```{r setup, include=FALSE}
library(flexdashboard)
library(plotly)
library(tidyverse)
library(summarytools)
library(formattable)
library(knitr)
library(kableExtra)
library(VIM)
library(lvplot)
df <- read_csv("data/outbreaks.csv")
```

Overall {data-navmenu="DQ Navigation" data-icon="fa-list" data-orientation=rows}
===

Column 
-----------------------------------------------------------------------

### Data completeness

```{r}
DC <- 97
gauge(DC, min = 0, max = 100, symbol = '%', gaugeSectors(
  success = c(90, 100), warning = c(70, 89), danger = c(0, 69)
))
```
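
The gauge above shows a hard-coded value (`DC <- 97`). As a minimal sketch of how such a percentage could instead be derived from the data, the snippet below computes the share of non-missing cells in a data frame. The helper `completeness_pct()` and the toy data frame are hypothetical and only illustrate the idea; this is one possible definition of completeness, and it is not claimed to reproduce the value shown in the gauge.

```r
# Toy stand-in for the outbreak data (assumption: the real dashboard
# reads data/outbreaks.csv into df instead).
toy <- data.frame(
  Year      = c(1998L, 1999L, 2000L, 2001L),
  State     = c("Florida", "Ohio", NA, "Ohio"),
  Illnesses = c(12L, NA, 8L, 5L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: percentage of cells that are not NA,
# rounded to one decimal place.
completeness_pct <- function(data) {
  round(100 * mean(!is.na(data)), digits = 1)
}

# 10 of the 12 cells in the toy data are filled.
DC_toy <- completeness_pct(toy)
DC_toy
```

A value computed this way could be passed to `gauge()` in place of the constant, for example `gauge(completeness_pct(df), min = 0, max = 100, symbol = '%')`.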

### Data accuracy

```{r}
DA <- 59
gauge(DA, min = 0, max = 100, symbol = '%', gaugeSectors(
  success = c(90, 100), warning = c(70, 89), danger = c(0, 69)
))
```

### Overall data quality

```{r}
DQ <- 85
gauge(DQ, min = 0, max = 100, symbol = '%', gaugeSectors(
  success = c(90, 100), warning = c(70, 89), danger = c(0, 69)
))
```

Column {data-width=350}
-----------------------------------------------------------------------

### Trend of cases in percent

```{r}
varis <- c("Florida", "California", "Ohio", "Illinois", "New York")
g2 <- df %>% 
  add_count(Year, State) %>% 
  add_count(State) %>% mutate(per = n/nn) %>% 
  select(Year, State, n, nn, per) %>% 
  filter(grepl(paste(varis, collapse = "|"), 
               State)) %>% 
  distinct() %>% 
  ggplot(aes(Year, per, color = State)) +
  geom_line() + theme_classic() + xlab("") +
  ylab("") + 
  scale_color_brewer(palette = "Set1", type = "qual") +
  scale_size_continuous(guide = "none")
ggplotly(g2) %>% 
  layout(legend = list(x = 0.75, y = 0.95))
```

### Total cases per year

```{r}
# Number of rows (reported outbreaks) per year, shaded from white to red
df %>% 
  group_by(Year) %>% 
  summarise(Cases_per_year = n()) %>% 
  mutate(Cases_per_year = color_tile("white", "red")(Cases_per_year)) %>% 
  kable(escape = FALSE, align = "c") %>%
  kable_styling(full_width = FALSE)
```


Data summary {data-navmenu="DQ Navigation" data-icon="fa-list" data-orientation=columns}
===

Column {data-width=650}
-----------------------------------------------------------------------

### Chart A

```{r}
print(dfSummary(df, style = "grid", plain.ascii = FALSE, graph.magnif = 0.85),
      method = "render", omit.headings = TRUE)
```


Data completeness {data-navmenu="DQ Navigation" data-icon="fa-list" data-orientation=columns}
===

Column {data-width=650}
-----------------------------------------------------------------------

### Missing values per Variable (in Percent)

```{r}
# Count NAs per column (VIM::countNA) and express them as a share of all rows
g <- df %>% 
  summarise_all(countNA) %>% 
  gather(Variable, missings) %>% 
  mutate(miss_per = missings/nrow(df)*100) %>% 
  ggplot(aes(reorder(Variable, miss_per), miss_per, fill = miss_per)) +
  geom_bar(stat = "identity") +
  scale_fill_viridis_c(option = "C", guide = "none") +
  coord_flip() + theme_classic() + xlab("") + ylab("") +
  ggtitle("Missing values per Variable (in Percent)")
ggplotly(g) %>% 
  layout(showlegend = FALSE)
```

Column {data-width=350}
-----------------------------------------------------------------------

### Missing values per State (in Percent)

```{r}
# Missing cells per state, divided by rows per state times non-State columns
g2 <- df %>% 
  group_by(State) %>% 
  summarise_all(countNA) %>% 
  gather(Variable, missing, -State) %>% 
  group_by(State) %>% 
  summarise(miss = sum(missing)) %>% 
  left_join(df %>% 
              add_count(State) %>% 
              mutate(n = n*(ncol(df)-1)) %>% 
              select(State, n) %>% 
              distinct(),
            by = "State") %>% 
  mutate(miss_per_s = round((miss/n*100), digits = 1)) %>% 
  filter(miss_per_s >= 31.6) %>% 
  ggplot(aes(reorder(State, miss_per_s), miss_per_s, fill = miss_per_s)) +
  geom_bar(stat = "identity") +
  scale_fill_viridis_c(option = "C", guide = "none") +
  coord_flip() + theme_classic() + xlab("") + ylab("") +
  ggtitle("Missing values per State (in Percent)")
ggplotly(g2) %>% 
  layout(showlegend = FALSE)
```

### Overall data completeness

```{r}
DC <- 97
valueBox(DC, icon = "fa-thumbs-up", color = "green")
```


Data accuracy {data-navmenu="DQ Navigation" data-icon="fa-list" data-orientation=columns}
===

Column {data-width=650}
-----------------------------------------------------------------------

### Number of illnesses per outbreak by US state

```{r}
# Letter-value plot (lvplot) for states with at least 1000 reported outbreaks
df %>% 
  select(State, Illnesses) %>% 
  add_count(State) %>% 
  filter(n >= 1000) %>% 
  ggplot(aes(State, Illnesses)) +
  geom_lv(outlier.colour = "red", fill = "#005aa0", color = "#005aa0") +
  theme_classic() + xlab("") + ylab("")
```

Column {data-width=350}
-----------------------------------------------------------------------

### Chart B

```{r}
test <- df %>% 
  select(Year, Ingredient, Species, Status) %>% 
  as.matrix()
pbox(test)
```

### Chart C

```{r}
DA <- 59
valueBox(DA, icon = "fa-thumbs-down", color = "red")
```
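
The "Missing values per State (in Percent)" chart divides the number of missing cells in a state by the number of rows for that state times `ncol(df) - 1` (every column except `State`). The following base R sketch walks through that arithmetic on a toy data frame. The toy data and the intermediate names (`value_cols`, `missing_per_row`, `cells`) are assumptions made for illustration and are not part of the dashboard.

```r
# Toy stand-in for the outbreak data (assumption, not the real file).
toy <- data.frame(
  State     = c("Florida", "Florida", "Ohio", "Ohio", "Ohio"),
  Food      = c("Oysters, Raw", NA, NA, NA, "Pizza, Unspecified"),
  Illnesses = c(4L, 10L, NA, 3L, 7L),
  stringsAsFactors = FALSE
)

# All columns except the grouping column State.
value_cols <- setdiff(names(toy), "State")

# Missing cells per row, then summed within each state.
missing_per_row <- rowSums(is.na(toy[value_cols]))
miss <- tapply(missing_per_row, toy$State, sum)

# Denominator: rows per state times the number of non-State columns.
cells <- table(toy$State) * length(value_cols)

# Percentage of missing cells per state, rounded to one decimal place.
miss_per_s <- round(100 * miss / as.vector(cells[names(miss)]), digits = 1)
miss_per_s
```

In the toy data Florida has 1 missing cell out of 4 and Ohio has 3 out of 6, so the two shares are 25 and 50 percent.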